# Load libraries at the top of the .Rmd
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.1     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readxl)
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
library(estimatr)
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

Empirical Analysis of Energy Markets - U6616

Empirical Exercise I - Data Exploration - Answer Key

Intro

This problem set is due on October 2. I strongly recommend to start working on the homework early. I would suggest to plan on completing the first two parts by September 20. You can work in pairs and submit a common solution. Please submit the homework as an R markdown file (if there are data files, they put all the files in a zip file). The code must run without errors. To make this easier, set the working directory at the beginning so it can be easily changed by someone else running the code.

The purpose of this dataset is to analyze how the capacity of different fuels used to generate electricity in the United States has changed over time. We will also look at how emissions of different pollutants have changed and think about potential links between the two.

Question 1

Start at the EIA Electric Power Annual.

First download Tables 4.2.A and 4.2.B., which contain capacity by source over time.

I do this using code, but you could have just downloaded the excel files normally and put them in your R working directory (you figure this out by writing getwd(), or opening a new project in a particular folder) to complete this part.

# Put the urls in a variable
urls <- c("https://www.eia.gov/electricity/annual/xls/epa_04_02_a.xlsx","https://www.eia.gov/electricity/annual/xls/epa_04_02_b.xlsx")

# Where do I want to store the data and outputs for the whole exercise?
# educational note: "." means current working directory, check your current working directory with getwd()
data_folder <- "data" 
output_folder <- "output"


download_file <- function(url){
  #' Function to download and unzip.
  # Input
  print(url)
  # file name
  print(basename(url))
  # concatenate the filename to your data storage
  fname <- file.path(data_folder, basename(url))
  print(fname)
  # If the file doesn't exist download it, otherwise just return where it's stored.
  if (!file.exists(fname)){
    download.file(url, fname)
  }
  return(fname)
}

# Create those folders if they don't exist
invisible(dir.create(file.path(".", data_folder), showWarnings = T))
## Warning in dir.create(file.path(".", data_folder), showWarnings = T): './data'
## already exists
invisible(dir.create(file.path(".", output_folder), showWarnings = T))
## Warning in dir.create(file.path(".", output_folder), showWarnings = T): './
## output' already exists
# Use the function
fnames <- c(download_file(urls[1]),download_file(urls[2]))
## [1] "https://www.eia.gov/electricity/annual/xls/epa_04_02_a.xlsx"
## [1] "epa_04_02_a.xlsx"
## [1] "data/epa_04_02_a.xlsx"
## [1] "https://www.eia.gov/electricity/annual/xls/epa_04_02_b.xlsx"
## [1] "epa_04_02_b.xlsx"
## [1] "data/epa_04_02_b.xlsx"

Load the files to R and describe each variable. Make sure to use the appropriate class for each variable.

  • Loading the files is easiest using the ‘import dataset’ function, or navigating to the file in the lower right hand side, finding and clicking the file and hitting import dataset from the drop-down menu.
  • By use the appropriate class, we mean that the data type of the variable should be appropriate for analysis.
  • Anticipating the future questions, you only needed to load the first 12 rows of each file.
  • You could describe each variable in the data-type sense with str(), or in the statistical sense with summary(). Bonus points given for doing both.
  • Using int or date are both acceptable for dealing with years (ints are easier, the advantage of dates is allowing you to deal with leap years, counting number of days across years and months, etc)
Describe each variable:

Year is self-described, each of the other columns is the capacity in MW for each energy source.

## Warning: NAs introduced by coercion
## Warning in eval(jsub, SDenv, parent.frame()): NAs introduced by coercion
## Classes 'data.table' and 'data.frame':   11 obs. of  12 variables:
##  $ Year                        : int  2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 ...
##  $ Coal                        : num  313322 314294 316800 317640 309680 ...
##  $ Petroleum                   : num  57445 56780 55647 51482 47167 ...
##  $ Natural Gas                 : num  397460 401272 407028 415191 422364 ...
##  $ Other Gases                 : num  1995 1932 2700 1934 1946 ...
##  $ Nuclear                     : num  100755 101004 101167 101419 101885 ...
##  $ Hydroelectric Conventional  : num  77930 78518 78825 78652 78738 ...
##  $ Other Renewable Sources     : num  38466 48552 53811 61221 77155 ...
##  $ Hydroelectric Pumped Storage: num  21858 22160 22199 22293 22368 ...
##  $ Other Energy Sources        : num  942 888 884 1420 1729 ...
##  $ Utility Total               : num  1010171 1025400 1039062 1051251 1063033 ...
##  $ Estimated Photovoltaic      : num  NA NA NA NA NA ...
##  - attr(*, ".internal.selfref")=<externalptr>
## Classes 'data.table' and 'data.frame':   11 obs. of  11 variables:
##  $ Year                                       : int  2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 ...
##  $ Wind                                       : num  24651 34296 39134 45676 59075 ...
##  $ Solar Photovoltaic                         : num  70.8 145.5 393.4 1052 2694.1 ...
##  $ Solar Thermal                              : num  465 473 473 472 476 ...
##  $ Wood and Wood-Derived Fuels                : num  6864 6939 7037 7076 7508 ...
##  $ Geothermal                                 : num  2229 2382 2405 2409 2592 ...
##  $ Other Biomass                              : num  4186 4316 4368 4536 4811 ...
##  $ 
## Total Utility (Other Renewable Sources): num  38466 48552 53811 61221 77155 ...
##  $ Estimated Small Scale Photovoltaic         : num  NA NA NA NA NA ...
##  $ Total Solar Photovoltaic                   : num  70.8 145.5 393.4 1052 2694.1 ...
##  $ Total Solar                                : num  536 618 866 1524 3170 ...
##  - attr(*, ".internal.selfref")=<externalptr>
##       Year           Coal          Petroleum      Natural Gas    
##  Min.   :2008   Min.   :242786   Min.   :32218   Min.   :397460  
##  1st Qu.:2010   1st Qu.:273170   1st Qu.:35606   1st Qu.:411110  
##  Median :2013   Median :303306   Median :43523   Median :425390  
##  Mean   :2013   Mean   :292710   Mean   :44538   Mean   :428487  
##  3rd Qu.:2016   3rd Qu.:313808   3rd Qu.:53564   3rd Qu.:443124  
##  Max.   :2018   Max.   :317640   Max.   :57445   Max.   :470237  
##                                                                  
##   Other Gases      Nuclear       Hydroelectric Conventional
##  Min.   :1914   Min.   : 98569   Min.   :77930             
##  1st Qu.:1940   1st Qu.: 99337   1st Qu.:78695             
##  Median :2108   Median : 99629   Median :79200             
##  Mean   :2219   Mean   :100122   Mean   :79162             
##  3rd Qu.:2479   3rd Qu.:101086   3rd Qu.:79736             
##  Max.   :2700   Max.   :101885   Max.   :79913             
##                                                            
##  Other Renewable Sources Hydroelectric Pumped Storage Other Energy Sources
##  Min.   : 38466          Min.   :21858                Min.   : 883.8      
##  1st Qu.: 57516          1st Qu.:22246                1st Qu.:1180.8      
##  Median : 82600          Median :22389                Median :1795.6      
##  Mean   : 86231          Mean   :22432                Mean   :1818.7      
##  3rd Qu.:111325          3rd Qu.:22677                3rd Qu.:2326.8      
##  Max.   :142474          Max.   :22830                Max.   :2886.3      
##                                                                           
##  Utility Total     Estimated Photovoltaic
##  Min.   :1010171   Min.   : 7327         
##  1st Qu.:1045156   1st Qu.: 9778         
##  Median :1063033   Median :12765         
##  Mean   :1057718   Mean   :13113         
##  3rd Qu.:1071378   3rd Qu.:16148         
##  Max.   :1094740   Max.   :19547         
##                    NA's   :6
##       Year           Wind       Solar Photovoltaic Solar Thermal   
##  Min.   :2008   Min.   :24651   Min.   :   70.8    Min.   : 464.8  
##  1st Qu.:2010   1st Qu.:42405   1st Qu.:  722.7    1st Qu.: 473.0  
##  Median :2013   Median :59973   Median : 5336.1    Median :1286.4  
##  Mean   :2013   Mean   :60265   Mean   : 9616.0    Mean   :1122.1  
##  3rd Qu.:2016   3rd Qu.:76930   3rd Qu.:16049.1    3rd Qu.:1757.9  
##  Max.   :2018   Max.   :94418   Max.   :30120.5    Max.   :1757.9  
##                                                                    
##  Wood and Wood-Derived Fuels   Geothermal   Other Biomass 
##  Min.   :6864                Min.   :2229   Min.   :4186  
##  1st Qu.:7057                1st Qu.:2407   1st Qu.:4452  
##  Median :8354                Median :2483   Median :5039  
##  Mean   :7962                Mean   :2466   Mean   :4801  
##  3rd Qu.:8763                3rd Qu.:2529   3rd Qu.:5107  
##  Max.   :8969                Max.   :2607   Max.   :5166  
##                                                           
##  \r\nTotal Utility (Other Renewable Sources) Estimated Small Scale Photovoltaic
##  Min.   : 38466                              Min.   : 7327                     
##  1st Qu.: 57516                              1st Qu.: 9778                     
##  Median : 82600                              Median :12765                     
##  Mean   : 86231                              Mean   :13113                     
##  3rd Qu.:111325                              3rd Qu.:16148                     
##  Max.   :142474                              Max.   :19547                     
##                                              NA's   :6                         
##  Total Solar Photovoltaic  Total Solar     
##  Min.   :   70.8          Min.   :  535.6  
##  1st Qu.:  722.7          1st Qu.: 1195.0  
##  Median : 5336.1          Median : 6622.5  
##  Mean   :15576.5          Mean   :16698.6  
##  3rd Qu.:27321.0          3rd Qu.:29078.8  
##  Max.   :49667.6          Max.   :51425.5  
## 

What is capacity? Make sure you understand the difference between capacity and generation.

Capacity The maximum electric power output (rate of energy generation, or energy over time) that can be output from a power plant.

These datasets use ‘Net Summer Capacity’, which is “the maximum output, commonly expressed in megawatts (MW), that generating equipment can supply to system load, as demonstrated by a multi-hour test, at the time of summer peak demand (period of June 1 through September 30.”

Generation is a measure of energy, not power (the derivative of energy over time), and is calculated with a formula like this:

\[\sum_i power_i*time_i \]

where each i refers to a different level of power generation that can be of any length. For example, if a 10MW natural gas plant produces its capacity (10MW) for 12 hours, half-capacity (5MW) for 6 hours, and is turned off for 6 hours, the generation is:

\[ 10MW*12h + 5MW * 6h + 0MW*6h = 150MWh.\]

Another term you hear relating these tho things is the capacity factor, which unintuitively is a ratio of actual generation over the total theoretical generation. Using the same example, the total theoretical annual generation of the 10MW natural gas plant is \(10MW *24hrs/day*365 days/year\), so if in a year the plant produces 40000MWh of energy, its capacity factor would be \[\frac{40000MWh}{10 MW * 24hrs/day*365 days/year} = 45.6\% \].

Plot total capacity over time. How has this changed?

Extra points for readable/converted units, and for x scale that doesn’t have .5s.

Total capacity increases over time, From 2008 - 2012, in increases by about 50 GW, and from 2015 to 2018 it increases by about 35 GW. It stays about constant from 2012-2016. The increases put together only constitute about 8% of the initial capacity.

# Make the plot GW for more readability!
plot1 <- ggplot(a , mapping = aes(x = Year, y = `Utility Total`/1000))+
  geom_line()+
  # Zoomed in version
  # limits set the bounds of the y axis
  scale_y_continuous(name = "Total Utility Capacity (GW)", limits = c(1000,1100)) + 
  # make every year labeled
  scale_x_continuous(n.breaks = 10) + 
  ggtitle("Total capacity over time 2008-2018")
ggsave("output/capacity2008-2018.png")
## Saving 7 x 5 in image
# Show the absolute value on the y axis so you don't exagerrate the difference.
# If you are using the same data, you can overwrite parameters for the plot this way:
plot2 <- plot1 + scale_y_continuous(name = "Total Utility Capacity (GW)", limits = c(0,1100))
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
ggsave("output/capacity_absolute2008-2018.png")
## Saving 7 x 5 in image
plot2

Plot the share of capacity from the different sources over time. How has this changed?

It’s really important here to realize that we ask you to use both tables, and that means you don’t want to double count. High-level algorithm: 1. Join tables 2. Remove double-counted columns 3. Plot share using a stacked bar or area graph

It’s nice to use intuitive colors for fuels.

# These aren't necessary because dt is smart, but you should set keys before joins to build good habits
setkey(a, Year)
setkey(b, Year)
# Join
dt1 <- merge(a,b)
dt1
# Remove double counted columns and totals
# Print possible columns
# names(dt1)


# List of sources you should remove
remove <- c("Utility Total","Estimated Photovoltaic","Other Renewable Sources","\r\nTotal Utility (Other Renewable Sources)","Total Solar Photovoltaic","Total Solar")


# Syntax tip
# DT[,Cols.Chosen:=NULL] Delete the column with column name Cols.chosen
# DT[,(Cols.Chosen):=NULL] Delete the columns specified in the variable Cols.chosen

# Remove these columns
dt1[,(remove) := NULL]

# Melt the columns down to plot
dt1 <- melt(dt1, "Year")
dt1
# You might get this error:     Removed 6 rows containing missing values (position_stack),
# that's because there's no data for Estimated Small Scale Photovoltaic before 2014. 
dt1[is.na(value)]
# You can reasonably assume it's 0
dt1[is.na(value), value := 0]

# Plot
# There's a color cheat sheet here, or you can use hex codes if you know them 
# http://sape.inf.usi.ch/quick-reference/ggplot2/colour
ggplot(dt1, aes(x = Year, y = value/1000, fill = variable)) + 
  geom_area() +  
  scale_x_continuous(n.breaks= 10) + 
  scale_y_continuous(name = "Capacity (GW)") +
  scale_fill_manual(name = "Energy Source", values=c("chocolate4", "black", "gray79","gray22", 'lightgoldenrod','cornflowerblue',"cadetblue1", "coral4", 'aliceblue','gold', "orange","tan4", 'brown2', "seagreen", "gold2"))

ggsave(file.path(output_folder, 'part1_capacity.pdf'), width = 16, height = 9)

Question 2

Now we will obtain data on emissions over time. For this, follow this link to the EIA Monthly Energy Review. Section 11 contains data on many emissions sources, but just download the data set on carbon emissions from energy consumption in the electric power sector (11.6).

There was some confusion about this filetype - T11.06 is in fact a csv/excel file depending on which version you downloaded. But it didn’t have the .xls or .csv at the end of the file, so some detective work was needed to figure this out. Either loading it directly, or changing the filename to add .csv or .xls at the end was necessary.

Load the files to R and describe each variable. Make sure to use the appropriate class for each variable.

This question needs either a datatype description or a statistical description. YYYYMM could be int, could be parsed to a date, or split into two ints.

 ##################
# Emissions by year

# Download the file using the function defined in q1.
fname <- download_file("https://www.eia.gov/totalenergy/data/browser/csv.php?tbl=T11.06")
## [1] "https://www.eia.gov/totalenergy/data/browser/csv.php?tbl=T11.06"
## [1] "csv.php?tbl=T11.06"
## [1] "data/csv.php?tbl=T11.06"
# Load a csv
em <-  fread(fname)

# data type and statistical description
str(em)
## Classes 'data.table' and 'data.frame':   5553 obs. of  6 variables:
##  $ MSN         : chr  "CLEIEUS" "CLEIEUS" "CLEIEUS" "CLEIEUS" ...
##  $ YYYYMM      : int  197301 197302 197303 197304 197305 197306 197307 197308 197309 197310 ...
##  $ Value       : chr  "72.076" "64.442" "64.084" "60.842" ...
##  $ Column_Order: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Description : chr  "Coal Electric Power Sector CO2 Emissions" "Coal Electric Power Sector CO2 Emissions" "Coal Electric Power Sector CO2 Emissions" "Coal Electric Power Sector CO2 Emissions" ...
##  $ Unit        : chr  "Million Metric Tons of Carbon Dioxide" "Million Metric Tons of Carbon Dioxide" "Million Metric Tons of Carbon Dioxide" "Million Metric Tons of Carbon Dioxide" ...
##  - attr(*, ".internal.selfref")=<externalptr>
summary(em)
##      MSN                YYYYMM          Value            Column_Order
##  Length:5553        Min.   :197301   Length:5553        Min.   :1    
##  Class :character   1st Qu.:198412   Class :character   1st Qu.:3    
##  Mode  :character   Median :199610   Mode  :character   Median :5    
##                     Mean   :199630                      Mean   :5    
##                     3rd Qu.:200808                      3rd Qu.:7    
##                     Max.   :202006                      Max.   :9    
##  Description            Unit          
##  Length:5553        Length:5553       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
em
# Get the year and month by getting the quotient and remainder after dividing by 100
em[, `:=`(year = YYYYMM%/%100, month =  YYYYMM%%100)]
# Get the annual data which is coded as the 13th month in this dataset
em <-  em[!grepl('total',Description, ignore.case = TRUE) & month==13]
# Check everything is in the same units, convert to factor
em[,c('Unit', 'Description','Value'):= list(as.factor(Unit),as.factor(Description),as.numeric(Value))]
## Warning in eval(jsub, SDenv, parent.frame()): NAs introduced by coercion
# Make missing values 0
em[is.na(Value), Value := 0]

Plot total emissions over time. How have they changed?

Total Emissions from the electricity grid have risen slowly by about 1 billion metric tons since the 1970s until a peak in 2005. Then it falls back to the 1970s levels in 2020 ‘decarbonization’. Note that this isn’t total energy in the US, but just power plants. The transport sector has not decarbonized by as much.

The difference between this and the next question is to just to a line graph and a stacked area/bar chart separately. Bonus points for relabelling axes to billion metric tons, marks for title, axes labels.

setkey(em, year)
em_total <- em[,.(`Total Emissions`=sum(Value)),by = key(em)]
ggplot(em_total, aes(x = year, y = `Total Emissions`/1000)) + 
  geom_line() + 
  scale_y_continuous(name = "Total Emissions (Billion Metric Tons of CO2)", limits = c(0,3)) + 
  ggtitle('2b) Emissions over time 1973-2017')

Plot the share of emissions from the different sources over time. How has this changed? How do you relate this to what you observed in the previous question?

Since 2005, natural gas has rapidly increased while coal has decreased. This has meant that electricity generation has substituted coal emissions for natural gas emissions. We can’t tell whether coal plants have totally been replaced or not yet, that’s part 3 - is the drop in emissions due to coal being replaced by natural gas? or renewables? What proportion?

# Plot emissions over time by technology of generator
ggplot(em, aes(x= year, y = Value/1000, fill = Description))+
  geom_area(position = 'stack') +
  ggtitle("2c) United States CO2 Emissions from Power Plants by Technology and Year", subtitle = "source: EIA-860 data") +
  ylab("Billion Metric Tons of CO2") +
  scale_fill_manual(name = "Power Plant Technology", values=c("chocolate4", "black", 'brown2',"gray79","coral4","chocolate4","black","gray22"))

ggsave(file.path(output_folder, 'emissions.pdf'), width = 16, height = 9)

Question 3

Since the previous dataset only had capacity for 10 years, the next step is to download data from EIA Form 860 to build capacity for a period as long as the one for emissions.

Follow the steps covered during recitation to build the total capacity of source for each year.

These are just the steps from recitation, I end up showing students part 3a-c.

Download and unzip file
# This dataset is no longer the early release version!
url <- "https://www.eia.gov/electricity/data/eia860/xls/eia8602019.zip"

# I want a variable for the name of my data folder
data_folder <-  "data"
# Create the folde if it doesn't exist
dir.create(file.path(".", data_folder), showWarnings = T)
## Warning in dir.create(file.path(".", data_folder), showWarnings = T): './data'
## already exists
# Get the path of the file when it is stored on disk
fname <- file.path(data_folder, basename(url))
fname
## [1] "data/eia8602019.zip"
# If the file doesn't exist already, download from the URL (the web) to the path (your computer's folder called data, relative to your working directory)
if (!file.exists(fname)){
  download.file(url, fname)
}
# Unzip the folder 
unzip(fname, exdir = data_folder)
Look and examine the data

  • What is the difference between 1, 2 and 3?

the relevant data is in 3

  • What is the difference between 3.1 and 3.2…

“Additional Details” are in 3_2+, these we don’t need because the capacity, technology, and the operating and retirement years are in 3_1.

  • Look at the data we want, is it sufficient to answer the question?

…build capacity for a period as long as the one for emissions

Assumption

We don’t need any more data. Assume:

  • All power plants in ‘Operable’ are still ‘active’ and have been active since “Operating Year”
  • All power plants in ‘Retired and Cancelled’ were ‘active’ from “Operating Year” until “Retirement Year”, and that this dataset is exhaustive.
  • Power plant capacity in any given year is the total capacity of ‘active’ plants.
2 methods
  • Have a column with active status…
  • Keep a running tally for each fuel and year.
  • What is the minimum dataset we need for each method?
High-level algorithm for the running tally method

My ideal dataset has a cumulative capacity for each technology-year pair.

  1. For all those power plants are operable, we assume they are active up until the current year, so they never go offline.
  2. For all those power plants in ‘Retired and Cancelled’, we assume they are active between their operating and retirement years; if there’s no data, we just assume they are cancelled and never operated.
  3. Get the sum of Power Plant Capacity that came online in each year from the Operating Year column - ‘additions’ - by technology.
  4. Get the sum of Power Plant Capacity that was retired/came offline from the Retirement Year column - ‘subtractions’ - in each year by technology.
  5. The ‘net capacity change’ for each technology-year pair is additions less subtractions.
  6. Running a cumulative sum over the years by technology will give me the total capacity by technology in each year.
Join vs Concatenate?
  • Concatenating (rbindlist) is stacking two datasets
  • Joining (merge, join) is putting two datsets next to each other
  • Which do we want to do here?

Concatenate

  • Which columns are important to do correctly?

The minimum dataset has Operating Year, Retirement Year, Nameplate Capacity (MW), and Technology.

# Read in the data after unzipping
fname <- file.path(data_folder, "3_1_Generator_Y2019.xlsx")

# Import the excel sheet using Import Dataset in RStudio.
operable <-  as.data.table(read_excel(fname, sheet = "Operable", skip = 1, na = 'NA'))
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S1114 / R1114C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S1115 / R1115C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S1747 / R1747C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S1748 / R1748C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S2432 / R2432C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S2433 / R2433C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S2434 / R2434C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S2435 / R2435C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S2436 / R2436C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S3735 / R3735C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S3736 / R3736C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S3737 / R3737C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6093 / R6093C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6094 / R6094C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6095 / R6095C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6097 / R6097C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6098 / R6098C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6099 / R6099C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6100 / R6100C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6101 / R6101C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6102 / R6102C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6103 / R6103C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6436 / R6436C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6437 / R6437C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6439 / R6439C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6440 / R6440C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6699 / R6699C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S6700 / R6700C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S7058 / R7058C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S7272 / R7272C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S8351 / R8351C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S8352 / R8352C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S8353 / R8353C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S8354 / R8354C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S8355 / R8355C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S8821 / R8821C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S8845 / R8845C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S8905 / R8905C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S9386 / R9386C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S9388 / R9388C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S9389 / R9389C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S9422 / R9422C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10004 / R10004C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10005 / R10005C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10111 / R10111C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10113 / R10113C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10196 / R10196C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10382 / R10382C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10383 / R10383C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10384 / R10384C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10385 / R10385C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10386 / R10386C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10387 / R10387C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10498 / R10498C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S10679 / R10679C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S11563 / R11563C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S11564 / R11564C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S11565 / R11565C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S11566 / R11566C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S11567 / R11567C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S11840 / R11840C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S11841 / R11841C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12415 / R12415C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12416 / R12416C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12419 / R12419C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12517 / R12517C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12518 / R12518C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12520 / R12520C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12521 / R12521C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12604 / R12604C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12605 / R12605C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12606 / R12606C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12607 / R12607C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12608 / R12608C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12609 / R12609C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12610 / R12610C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12611 / R12611C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12903 / R12903C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12904 / R12904C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12905 / R12905C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12906 / R12906C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12907 / R12907C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12908 / R12908C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12909 / R12909C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12910 / R12910C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S12999 / R12999C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13000 / R13000C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13049 / R13049C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13050 / R13050C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13051 / R13051C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13052 / R13052C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13063 / R13063C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13064 / R13064C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13065 / R13065C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13228 / R13228C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13229 / R13229C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13276 / R13276C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13277 / R13277C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13278 / R13278C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S13279 / R13279C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S14206 / R14206C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S14961 / R14961C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S14962 / R14962C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S14963 / R14963C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S14964 / R14964C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S14965 / R14965C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S14966 / R14966C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S15510 / R15510C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S15511 / R15511C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S16425 / R16425C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S16426 / R16426C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S17466 / R17466C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S17602 / R17602C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S18166 / R18166C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S19015 / R19015C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S19299 / R19299C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S19709 / R19709C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S19711 / R19711C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S19712 / R19712C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S19820 / R19820C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S19987 / R19987C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S20327 / R20327C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S20328 / R20328C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S20468 / R20468C19: ''
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Coercing text to numeric in S20624 / R20624C19: ''
# Look at structure
str(operable)
## Classes 'data.table' and 'data.frame':   22731 obs. of  73 variables:
##  $ Utility ID                                                             : num  63560 63560 63560 63560 63560 ...
##  $ Utility Name                                                           : chr  "TDX Sand Point Generating, LLC" "TDX Sand Point Generating, LLC" "TDX Sand Point Generating, LLC" "TDX Sand Point Generating, LLC" ...
##  $ Plant Code                                                             : num  1 1 1 1 1 1 2 3 3 3 ...
##  $ Plant Name                                                             : chr  "Sand Point" "Sand Point" "Sand Point" "Sand Point" ...
##  $ State                                                                  : chr  "AK" "AK" "AK" "AK" ...
##  $ County                                                                 : chr  "Aleutians East" "Aleutians East" "Aleutians East" "Aleutians East" ...
##  $ Generator ID                                                           : chr  "1" "2" "3" "5" ...
##  $ Technology                                                             : chr  "Petroleum Liquids" "Petroleum Liquids" "Petroleum Liquids" "Petroleum Liquids" ...
##  $ Prime Mover                                                            : chr  "IC" "IC" "IC" "IC" ...
##  $ Unit Code                                                              : chr  NA NA NA NA ...
##  $ Ownership                                                              : chr  "S" "S" "S" "S" ...
##  $ Duct Burners                                                           : chr  "X" "X" "X" "X" ...
##  $ Can Bypass Heat Recovery Steam Generator?                              : chr  "X" "X" "X" "X" ...
##  $ RTO/ISO LMP Node Designation                                           : chr  NA NA NA NA ...
##  $ RTO/ISO Location Designation for Reporting Wholesale Sales Data to FERC: chr  NA NA NA NA ...
##  $ Nameplate Capacity (MW)                                                : num  0.9 0.9 0.5 0.7 0.5 ...
##  $ Nameplate Power Factor                                                 : chr  "0.8" "0.8" "0.8" "0.8" ...
##  $ Summer Capacity (MW)                                                   : chr  "0.4" "0.3" "0.3" "0.4" ...
##  $ Winter Capacity (MW)                                                   : num  0.4 0.3 0.3 0.3 0.1 0.3 53 80 80 362 ...
##  $ Minimum Load (MW)                                                      : chr  "0.4" "0.3" "0.3" "0.3" ...
##  $ Uprate or Derate Completed During Year                                 : chr  "N" "N" "N" "N" ...
##  $ Month Uprate or Derate Completed                                       : chr  NA NA NA NA ...
##  $ Year Uprate or Derate Completed                                        : chr  NA NA NA NA ...
##  $ Status                                                                 : chr  "SB" "OP" "OA" "OA" ...
##  $ Synchronized to Transmission Grid                                      : chr  "Y" "X" "X" "X" ...
##  $ Operating Month                                                        : num  12 12 12 12 10 10 7 2 7 12 ...
##  $ Operating Year                                                         : num  2000 2000 2010 2000 2011 ...
##  $ Planned Retirement Month                                               : chr  NA NA NA NA ...
##  $ Planned Retirement Year                                                : chr  NA NA NA NA ...
##  $ Associated with Combined Heat and Power System                         : chr  "N" "N" "N" "N" ...
##  $ Sector Name                                                            : chr  "IPP Non-CHP" "IPP Non-CHP" "IPP Non-CHP" "IPP Non-CHP" ...
##  $ Sector                                                                 : num  2 2 2 2 2 2 1 1 1 1 ...
##  $ Topping or Bottoming                                                   : chr  "X" "X" "X" "X" ...
##  $ Energy Source 1                                                        : chr  "DFO" "DFO" "DFO" "DFO" ...
##  $ Energy Source 2                                                        : chr  NA NA NA NA ...
##  $ Energy Source 3                                                        : chr  NA NA NA NA ...
##  $ Energy Source 4                                                        : chr  NA NA NA NA ...
##  $ Energy Source 5                                                        : chr  NA NA NA NA ...
##  $ Energy Source 6                                                        : chr  NA NA NA NA ...
##  $ Startup Source 1                                                       : chr  NA NA NA NA ...
##  $ Startup Source 2                                                       : chr  NA NA NA NA ...
##  $ Startup Source 3                                                       : chr  NA NA NA NA ...
##  $ Startup Source 4                                                       : chr  NA NA NA NA ...
##  $ Solid Fuel Gasification System?                                        : chr  NA NA NA NA ...
##  $ Carbon Capture Technology?                                             : chr  NA NA NA NA ...
##  $ Turbines or Hydrokinetic Buoys                                         : chr  NA NA NA NA ...
##  $ Time from Cold Shutdown to Full Load                                   : chr  "10M" "10M" "10M" "10M" ...
##  $ Fluidized Bed Technology?                                              : chr  NA NA NA NA ...
##  $ Pulverized Coal Technology?                                            : chr  NA NA NA NA ...
##  $ Stoker Technology?                                                     : chr  NA NA NA NA ...
##  $ Other Combustion Technology?                                           : chr  NA NA NA NA ...
##  $ Subcritical Technology?                                                : chr  NA NA NA NA ...
##  $ Supercritical Technology?                                              : chr  NA NA NA NA ...
##  $ Ultrasupercritical Technology?                                         : chr  NA NA NA NA ...
##  $ Planned Net Summer Capacity Uprate (MW)                                : chr  NA NA NA NA ...
##  $ Planned Net Winter Capacity Uprate (MW)                                : chr  NA NA NA NA ...
##  $ Planned Uprate Month                                                   : chr  NA NA NA NA ...
##  $ Planned Uprate Year                                                    : chr  NA NA NA NA ...
##  $ Planned Net Summer Capacity Derate (MW)                                : chr  NA NA NA NA ...
##  $ Planned Net Winter Capacity Derate (MW)                                : chr  NA NA NA NA ...
##  $ Planned Derate Month                                                   : chr  NA NA NA NA ...
##  $ Planned Derate Year                                                    : chr  NA NA NA NA ...
##  $ Planned New Prime Mover                                                : chr  NA NA NA NA ...
##  $ Planned Energy Source 1                                                : chr  NA NA NA NA ...
##  $ Planned New Nameplate Capacity (MW)                                    : chr  NA NA NA NA ...
##  $ Planned Repower Month                                                  : chr  NA NA NA NA ...
##  $ Planned Repower Year                                                   : chr  NA NA NA NA ...
##  $ Other Planned Modifications?                                           : chr  NA NA NA NA ...
##  $ Other Modifications Month                                              : chr  NA NA NA NA ...
##  $ Other Modifications Year                                               : chr  NA NA NA NA ...
##  $ Multiple Fuels?                                                        : chr  "N" "N" "N" "N" ...
##  $ Cofire Fuels?                                                          : chr  NA NA NA NA ...
##  $ Switch Between Oil and Natural Gas?                                    : chr  NA NA NA NA ...
##  - attr(*, ".internal.selfref")=<externalptr>
# View the table
head(operable)
# Count how many nas there are
sum(is.na(operable$`Operating Year`)) # base r version
## [1] 0
operable[,sum(is.na(`Operating Year`))] # data.table version, both work, this is better
## [1] 0
# View the rows for which there  there's no operating year. 
operable[is.na(`Operating Year`),] 
# Get the second sheet
retired <-  as.data.table(read_excel(fname, sheet = "Retired and Canceled", skip = 1, na = 'NA'))
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in A4631 / R4631C1: got 'NOTE: The list only includes those
## retired generators which were reported in the most current data cycle. This is
## not a comprehensive list. Starting with March 2017 data, Preliminary Monthly
## Electric Generator Inventory table (https://www.eia.gov/electricity/data/
## eia860m/) includes a comprehensive list of generators which retired since 2002.
## The list can be found on the 'Retired' tab of the data file.'
str(retired)
## Classes 'data.table' and 'data.frame':   4629 obs. of  57 variables:
##  $ Utility ID                                                             : num  195 195 195 195 195 ...
##  $ Utility Name                                                           : chr  "Alabama Power Co" "Alabama Power Co" "Alabama Power Co" "Alabama Power Co" ...
##  $ Plant Code                                                             : num  3 8 8 8 8 8 47 47 47 47 ...
##  $ Plant Name                                                             : chr  "Barry" "Gorgas" "Gorgas" "Gorgas" ...
##  $ State                                                                  : chr  "AL" "AL" "AL" "AL" ...
##  $ County                                                                 : chr  "Mobile" "Walker" "Walker" "Walker" ...
##  $ Generator ID                                                           : chr  "3" "10" "6" "7" ...
##  $ Technology                                                             : chr  "Conventional Steam Coal" "Conventional Steam Coal" "Conventional Steam Coal" "Conventional Steam Coal" ...
##  $ Prime Mover                                                            : chr  "ST" "ST" "ST" "ST" ...
##  $ Unit Code                                                              : chr  NA NA NA NA ...
##  $ Ownership                                                              : chr  "S" "S" "S" "S" ...
##  $ Duct Burners                                                           : chr  "X" "X" "X" "X" ...
##  $ Can Bypass Heat Recovery Steam Generator?                              : chr  "X" "X" "X" "X" ...
##  $ RTO/ISO LMP Node Designation                                           : chr  NA NA NA NA ...
##  $ RTO/ISO Location Designation for Reporting Wholesale Sales Data to FERC: chr  NA NA NA NA ...
##  $ Nameplate Capacity (MW)                                                : chr  "272" "788.8" "125" "125" ...
##  $ Nameplate Power Factor                                                 : chr  "0.85" "0.85" "0.85" "0.85" ...
##  $ Summer Capacity (MW)                                                   : chr  "249" "727.7" "103" "104" ...
##  $ Winter Capacity (MW)                                                   : chr  "249" "727.7" "103" "104" ...
##  $ Minimum Load (MW)                                                      : chr  "130" "600" "50" "50" ...
##  $ Uprate or Derate Completed During Year                                 : chr  "N" "N" "N" "N" ...
##  $ Month Uprate or Derate Completed                                       : chr  NA NA NA NA ...
##  $ Year Uprate or Derate Completed                                        : chr  NA NA NA NA ...
##  $ Status                                                                 : chr  "RE" "RE" "RE" "RE" ...
##  $ Synchronized to Transmission Grid                                      : chr  "X" "X" "X" "X" ...
##  $ Operating Month                                                        : chr  "7" "10" "4" "7" ...
##  $ Operating Year                                                         : chr  "1959" "1972" "1951" "1952" ...
##  $ Retirement Month                                                       : chr  "8" "4" "8" "8" ...
##  $ Retirement Year                                                        : chr  "2015" "2019" "2015" "2015" ...
##  $ Associated with Combined Heat and Power System                         : chr  "N" "N" "N" "N" ...
##  $ Sector Name                                                            : chr  "Electric Utility" "Electric Utility" "Electric Utility" "Electric Utility" ...
##  $ Sector                                                                 : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Topping or Bottoming                                                   : chr  "X" "X" "X" "X" ...
##  $ Energy Source 1                                                        : chr  "BIT" "BIT" "BIT" "BIT" ...
##  $ Energy Source 2                                                        : chr  NA NA NA NA ...
##  $ Energy Source 3                                                        : chr  NA NA NA NA ...
##  $ Energy Source 4                                                        : chr  NA NA NA NA ...
##  $ Energy Source 5                                                        : chr  NA NA NA NA ...
##  $ Energy Source 6                                                        : chr  NA NA NA NA ...
##  $ Startup Source 1                                                       : chr  "NG" "DFO" "DFO" "DFO" ...
##  $ Startup Source 2                                                       : chr  NA NA NA NA ...
##  $ Startup Source 3                                                       : chr  NA NA NA NA ...
##  $ Startup Source 4                                                       : chr  NA NA NA NA ...
##  $ Solid Fuel Gasification System?                                        : chr  NA "N" "N" "N" ...
##  $ Carbon Capture Technology?                                             : chr  NA NA NA NA ...
##  $ Turbines or Hydrokinetic Buoys                                         : chr  NA NA NA NA ...
##  $ Time from Cold Shutdown to Full Load                                   : chr  "OVER" "OVER" "OVER" "OVER" ...
##  $ Fluidized Bed Technology?                                              : chr  NA NA NA NA ...
##  $ Pulverized Coal Technology?                                            : chr  "Y" "Y" "Y" "Y" ...
##  $ Stoker Technology?                                                     : chr  NA NA NA NA ...
##  $ Other Combustion Technology?                                           : chr  NA NA NA NA ...
##  $ Subcritical Technology?                                                : chr  "Y" NA "Y" "Y" ...
##  $ Supercritical Technology?                                              : chr  NA "Y" NA NA ...
##  $ Ultrasupercritical Technology?                                         : chr  NA NA NA NA ...
##  $ Multiple Fuels?                                                        : chr  NA "N" NA NA ...
##  $ Cofire Fuels?                                                          : chr  NA NA NA NA ...
##  $ Switch Between Oil and Natural Gas?                                    : chr  NA NA NA NA ...
##  - attr(*, ".internal.selfref")=<externalptr>
# Count how many nas there are using the data table method
retired[,sum(is.na(`Operating Year`))] # These are all cancelled plants
## [1] 838
retired[,sum(is.na(`Retirement Year`))] # there are 2 retired plants that don't have a retirement year, why? You could assume these retired in 2019, but I don't in this sketch.
## [1] 839
# Try binding

# Look up ?rbindlist if you haven't already
# The first parameter is list
# use.names makes you match columns based on their names instead of column order
# Fill is to fill 'retirement Year' in the operable sheet with NAs.
# idcol provides a column in df (for dataframe) with the list-position of the dataset it came from.  i.e. if you call df$.id it'll show 1s and 2s. 1s mean it comes from operable, and 2s mean that it came from retired.
df <-  rbindlist(list(operable, retired), use.names = T, fill =  T, idcol = T)
# Same checks for nulls, note how these numbers change
df[,sum(is.na(`Operating Year`))]
## [1] 838
df[,sum(is.na(`Retirement Year`))] # Many nulls because none of the operable plants have been retired yet!
## [1] 23570
df<- df[!is.na(`Operating Year`)] # filter only those with an operating year
str(df)
## Classes 'data.table' and 'data.frame':   26522 obs. of  76 variables:
##  $ .id                                                                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Utility ID                                                             : num  63560 63560 63560 63560 63560 ...
##  $ Utility Name                                                           : chr  "TDX Sand Point Generating, LLC" "TDX Sand Point Generating, LLC" "TDX Sand Point Generating, LLC" "TDX Sand Point Generating, LLC" ...
##  $ Plant Code                                                             : num  1 1 1 1 1 1 2 3 3 3 ...
##  $ Plant Name                                                             : chr  "Sand Point" "Sand Point" "Sand Point" "Sand Point" ...
##  $ State                                                                  : chr  "AK" "AK" "AK" "AK" ...
##  $ County                                                                 : chr  "Aleutians East" "Aleutians East" "Aleutians East" "Aleutians East" ...
##  $ Generator ID                                                           : chr  "1" "2" "3" "5" ...
##  $ Technology                                                             : chr  "Petroleum Liquids" "Petroleum Liquids" "Petroleum Liquids" "Petroleum Liquids" ...
##  $ Prime Mover                                                            : chr  "IC" "IC" "IC" "IC" ...
##  $ Unit Code                                                              : chr  NA NA NA NA ...
##  $ Ownership                                                              : chr  "S" "S" "S" "S" ...
##  $ Duct Burners                                                           : chr  "X" "X" "X" "X" ...
##  $ Can Bypass Heat Recovery Steam Generator?                              : chr  "X" "X" "X" "X" ...
##  $ RTO/ISO LMP Node Designation                                           : chr  NA NA NA NA ...
##  $ RTO/ISO Location Designation for Reporting Wholesale Sales Data to FERC: chr  NA NA NA NA ...
##  $ Nameplate Capacity (MW)                                                : chr  "0.9" "0.9" "0.5" "0.7" ...
##  $ Nameplate Power Factor                                                 : chr  "0.8" "0.8" "0.8" "0.8" ...
##  $ Summer Capacity (MW)                                                   : chr  "0.4" "0.3" "0.3" "0.4" ...
##  $ Winter Capacity (MW)                                                   : chr  "0.4" "0.3" "0.3" "0.3" ...
##  $ Minimum Load (MW)                                                      : chr  "0.4" "0.3" "0.3" "0.3" ...
##  $ Uprate or Derate Completed During Year                                 : chr  "N" "N" "N" "N" ...
##  $ Month Uprate or Derate Completed                                       : chr  NA NA NA NA ...
##  $ Year Uprate or Derate Completed                                        : chr  NA NA NA NA ...
##  $ Status                                                                 : chr  "SB" "OP" "OA" "OA" ...
##  $ Synchronized to Transmission Grid                                      : chr  "Y" "X" "X" "X" ...
##  $ Operating Month                                                        : chr  "12" "12" "12" "12" ...
##  $ Operating Year                                                         : chr  "2000" "2000" "2010" "2000" ...
##  $ Planned Retirement Month                                               : chr  NA NA NA NA ...
##  $ Planned Retirement Year                                                : chr  NA NA NA NA ...
##  $ Associated with Combined Heat and Power System                         : chr  "N" "N" "N" "N" ...
##  $ Sector Name                                                            : chr  "IPP Non-CHP" "IPP Non-CHP" "IPP Non-CHP" "IPP Non-CHP" ...
##  $ Sector                                                                 : num  2 2 2 2 2 2 1 1 1 1 ...
##  $ Topping or Bottoming                                                   : chr  "X" "X" "X" "X" ...
##  $ Energy Source 1                                                        : chr  "DFO" "DFO" "DFO" "DFO" ...
##  $ Energy Source 2                                                        : chr  NA NA NA NA ...
##  $ Energy Source 3                                                        : chr  NA NA NA NA ...
##  $ Energy Source 4                                                        : chr  NA NA NA NA ...
##  $ Energy Source 5                                                        : chr  NA NA NA NA ...
##  $ Energy Source 6                                                        : chr  NA NA NA NA ...
##  $ Startup Source 1                                                       : chr  NA NA NA NA ...
##  $ Startup Source 2                                                       : chr  NA NA NA NA ...
##  $ Startup Source 3                                                       : chr  NA NA NA NA ...
##  $ Startup Source 4                                                       : chr  NA NA NA NA ...
##  $ Solid Fuel Gasification System?                                        : chr  NA NA NA NA ...
##  $ Carbon Capture Technology?                                             : chr  NA NA NA NA ...
##  $ Turbines or Hydrokinetic Buoys                                         : chr  NA NA NA NA ...
##  $ Time from Cold Shutdown to Full Load                                   : chr  "10M" "10M" "10M" "10M" ...
##  $ Fluidized Bed Technology?                                              : chr  NA NA NA NA ...
##  $ Pulverized Coal Technology?                                            : chr  NA NA NA NA ...
##  $ Stoker Technology?                                                     : chr  NA NA NA NA ...
##  $ Other Combustion Technology?                                           : chr  NA NA NA NA ...
##  $ Subcritical Technology?                                                : chr  NA NA NA NA ...
##  $ Supercritical Technology?                                              : chr  NA NA NA NA ...
##  $ Ultrasupercritical Technology?                                         : chr  NA NA NA NA ...
##  $ Planned Net Summer Capacity Uprate (MW)                                : chr  NA NA NA NA ...
##  $ Planned Net Winter Capacity Uprate (MW)                                : chr  NA NA NA NA ...
##  $ Planned Uprate Month                                                   : chr  NA NA NA NA ...
##  $ Planned Uprate Year                                                    : chr  NA NA NA NA ...
##  $ Planned Net Summer Capacity Derate (MW)                                : chr  NA NA NA NA ...
##  $ Planned Net Winter Capacity Derate (MW)                                : chr  NA NA NA NA ...
##  $ Planned Derate Month                                                   : chr  NA NA NA NA ...
##  $ Planned Derate Year                                                    : chr  NA NA NA NA ...
##  $ Planned New Prime Mover                                                : chr  NA NA NA NA ...
##  $ Planned Energy Source 1                                                : chr  NA NA NA NA ...
##  $ Planned New Nameplate Capacity (MW)                                    : chr  NA NA NA NA ...
##  $ Planned Repower Month                                                  : chr  NA NA NA NA ...
##  $ Planned Repower Year                                                   : chr  NA NA NA NA ...
##  $ Other Planned Modifications?                                           : chr  NA NA NA NA ...
##  $ Other Modifications Month                                              : chr  NA NA NA NA ...
##  $ Other Modifications Year                                               : chr  NA NA NA NA ...
##  $ Multiple Fuels?                                                        : chr  "N" "N" "N" "N" ...
##  $ Cofire Fuels?                                                          : chr  NA NA NA NA ...
##  $ Switch Between Oil and Natural Gas?                                    : chr  NA NA NA NA ...
##  $ Retirement Month                                                       : chr  NA NA NA NA ...
##  $ Retirement Year                                                        : chr  NA NA NA NA ...
##  - attr(*, ".internal.selfref")=<externalptr>
Get the right data types
# Convert all columns which are labelled with 'Plant Code', has 'MW', 'Year', or 'Month' to numeric data type.
num_cols <-  colnames(df)[grep("MW|Plant Code|Year|Month|Factor|Buoys", colnames(df), ignore.case=T)]
# Trick I showed last class to apply something over a bunch of columns. See the alternative there if you don't understand this
df[, (num_cols):= lapply(.SD, as.numeric), .SDcols = num_cols]
## Warning in lapply(.SD, as.numeric): NAs introduced by coercion
# The above 2 lines are *VERY FANCY*, I explained them in recitation 3. 
# You can also convert the columns one by one using code like this, which may be easier to understand
df[,`Nameplate Capacity (MW)`:=as.numeric(`Nameplate Capacity (MW)`)]
Change the column names for easy access
# Print the columns of df so I know what I am looking for
colnames(df)
##  [1] ".id"                                                                    
##  [2] "Utility ID"                                                             
##  [3] "Utility Name"                                                           
##  [4] "Plant Code"                                                             
##  [5] "Plant Name"                                                             
##  [6] "State"                                                                  
##  [7] "County"                                                                 
##  [8] "Generator ID"                                                           
##  [9] "Technology"                                                             
## [10] "Prime Mover"                                                            
## [11] "Unit Code"                                                              
## [12] "Ownership"                                                              
## [13] "Duct Burners"                                                           
## [14] "Can Bypass Heat Recovery Steam Generator?"                              
## [15] "RTO/ISO LMP Node Designation"                                           
## [16] "RTO/ISO Location Designation for Reporting Wholesale Sales Data to FERC"
## [17] "Nameplate Capacity (MW)"                                                
## [18] "Nameplate Power Factor"                                                 
## [19] "Summer Capacity (MW)"                                                   
## [20] "Winter Capacity (MW)"                                                   
## [21] "Minimum Load (MW)"                                                      
## [22] "Uprate or Derate Completed During Year"                                 
## [23] "Month Uprate or Derate Completed"                                       
## [24] "Year Uprate or Derate Completed"                                        
## [25] "Status"                                                                 
## [26] "Synchronized to Transmission Grid"                                      
## [27] "Operating Month"                                                        
## [28] "Operating Year"                                                         
## [29] "Planned Retirement Month"                                               
## [30] "Planned Retirement Year"                                                
## [31] "Associated with Combined Heat and Power System"                         
## [32] "Sector Name"                                                            
## [33] "Sector"                                                                 
## [34] "Topping or Bottoming"                                                   
## [35] "Energy Source 1"                                                        
## [36] "Energy Source 2"                                                        
## [37] "Energy Source 3"                                                        
## [38] "Energy Source 4"                                                        
## [39] "Energy Source 5"                                                        
## [40] "Energy Source 6"                                                        
## [41] "Startup Source 1"                                                       
## [42] "Startup Source 2"                                                       
## [43] "Startup Source 3"                                                       
## [44] "Startup Source 4"                                                       
## [45] "Solid Fuel Gasification System?"                                        
## [46] "Carbon Capture Technology?"                                             
## [47] "Turbines or Hydrokinetic Buoys"                                         
## [48] "Time from Cold Shutdown to Full Load"                                   
## [49] "Fluidized Bed Technology?"                                              
## [50] "Pulverized Coal Technology?"                                            
## [51] "Stoker Technology?"                                                     
## [52] "Other Combustion Technology?"                                           
## [53] "Subcritical Technology?"                                                
## [54] "Supercritical Technology?"                                              
## [55] "Ultrasupercritical Technology?"                                         
## [56] "Planned Net Summer Capacity Uprate (MW)"                                
## [57] "Planned Net Winter Capacity Uprate (MW)"                                
## [58] "Planned Uprate Month"                                                   
## [59] "Planned Uprate Year"                                                    
## [60] "Planned Net Summer Capacity Derate (MW)"                                
## [61] "Planned Net Winter Capacity Derate (MW)"                                
## [62] "Planned Derate Month"                                                   
## [63] "Planned Derate Year"                                                    
## [64] "Planned New Prime Mover"                                                
## [65] "Planned Energy Source 1"                                                
## [66] "Planned New Nameplate Capacity (MW)"                                    
## [67] "Planned Repower Month"                                                  
## [68] "Planned Repower Year"                                                   
## [69] "Other Planned Modifications?"                                           
## [70] "Other Modifications Month"                                              
## [71] "Other Modifications Year"                                               
## [72] "Multiple Fuels?"                                                        
## [73] "Cofire Fuels?"                                                          
## [74] "Switch Between Oil and Natural Gas?"                                    
## [75] "Retirement Month"                                                       
## [76] "Retirement Year"
# Rename these for easy access
df[,upyear := `Operating Year`]
df[,downyear := `Retirement Year`]
df[,capacity :=`Nameplate Capacity (MW)`]
Get a sum of new and retiring capacity year on year (new variable) by technology
# Set a 'key' for df
# In response to Gabriel's question, I don't explicitly actually use this line
# But I will set keys later that I do use!
setkey(df, upyear, Technology)
# View the key - this is just a list of columns for easy access.
key(df)
## [1] "upyear"     "Technology"
# One-line data.table query. by parameter:
# i = choose only the rows which have we have a year for which they come online
# j = create a column called year which is equal to upyear (this is to join with subtract later), and a column called 'capacity additions' which is the sum of capacities.
# by = I don't want to sum across the whole dataset for capacities, I only want to do so in groups, which is the operating year-technology pair.  
add <- df[!is.na(upyear),.(year = upyear,capacity_additions = sum(capacity,na.rm = T)), by = .(upyear, Technology)]

# Same code, I could have done the same thing using the key
add <- df[!is.na(upyear),.(year = upyear,capacity_additions = sum(capacity,na.rm = T)), by = key(df)]

# Same as making add, only I'm grouping the power plants by the years they come offline rather than online. 
subtract <-  df[!is.na(downyear),.(year = downyear,capacity_subtractions = sum(capacity,na.rm = T)), by = .(downyear, Technology)]
# view em
add
subtract
Join the additions and subtractions
## Use expand_grid to make sure that there's a key for every year
## (this is easy to overlook until you plot)
## The purpose of this is so you don't have NAs in your dataset - I want solar in 1893 to be 0, not NA, but this wouldn't be given by the add or subtract data.tables. 
# check ?expand_grid
# This is called an 'index' or 'key'
cap <- as.data.table(expand_grid('year' = min(add[,year]):max(add[,year]), 'Technology' = unique(add[,Technology])))
## Check out what this looks like
head(cap)
tail(cap)
# The purpose of the key is so that things can join later. Merges and Joins in data,table require that keys are the same so it knows which rows correspond to which other rows in two different datasets. 

## Set keys - note that it's year-Technology now so that the two tables match, instead of operating_year-technology and retirement_year-technology pairs. 
setkey(add, year, Technology)
setkey(subtract, year, Technology)
# Set the key to be the same as (capacity) additions and subtractions above
setkey(cap, year, Technology)


## Sense check - this commented-out code was for an early version of the dataset where there were some 0s in operating year
# add <- add[upyear !=0] 
## Use expand_grid to make sure that there's a key for every year
## (this is easy to overlook until you plot)
# cap <- as.data.table(expand_grid('year' = min(add[,year]):max(add[,year]), 'Technology' = unique(add[,Technology])))
# setkey(cap, year, Technology)

## now merge in the additions and subtractions
# I'm merging the additions and subtractions, all and all.x is equivalent to an outer join which means I keep ALL rows across the two tables. If I didn't do this I'd get back only those columns for which a technology-year pair had both power plants retiring and coming online.
add_subtract <-merge(add, subtract, all = TRUE)
# now I'm merging them with the index that I've defined. I'm doing a left-join here with all.x = TRUE, which means I only want to keep the index of the left dataset (my first parameter, in this case cap). This means that any rows in add_subtract without a corresponding index in the key of cap would be discarded. 
cap <-  merge(cap, add_subtract,all.x = TRUE)
# View the top 50 rows. There's a lot of NAs since many technologies did not exist in 1891.
head(cap,50)

Plot total capacity over time. How has this changed?

Total capacity grows over time a lot! It seems to follow an S-curve until the year 2000 from 0 to 800GW, and then jumps up again by 1.5x.

## So if there were no additions or subtractions in a year of a technology, set to 0.
cap[is.na(capacity_additions), capacity_additions:= 0]
cap[is.na(capacity_subtractions), capacity_subtractions:= 0]
# Calculate the net capacity change = additions - subtractions
cap_final <- cap[,.(net_capacity_change = capacity_additions-capacity_subtractions), by = key(cap)]
# Take a cumulatie sum over the years by technology.  Note here that because year is the first value of my key, the cumulative sum knows to sum over it. 
cap_final[, capacity := cumsum(net_capacity_change), by = Technology]
cap_final
cap_total <-  cap_final[, .(total_capacity = sum(capacity)), year ]

ggplot(cap_total, aes(x = year, y = `total_capacity`/1000)) + 
  geom_line() + 
  scale_y_continuous(name = "Total Capacity (GW)", limits = c(0,1250)) + 
  ggtitle("3b) Total Capacity over time")

tail(cap_total)

Plot the share of capacity from the different sources over time. How has this changed?

In the next 3 code blocks I spend a bunch of time matching colors to different fuels and aggregating the technologies to form less granular, more high-level groups. This wasn’t necessary, but it was good to make the colors intuitive for a reader.

It was not necessary to filter the dataset to 1973 until the last step, but it tells an exciting story if you don’t! In the pre 1910s, everything was hydroelectric. Then fossil fuels took off in the 1940s, and nuclear followed in the 1960s. If you take the stacked bar graph as a proportion, the minimum renewable % seems to have been in about 2005 which is surprising! and wind and solar only started recently, now together comprising about 10% of capacity. Most peole won’t know off the top of their head what combined cycle is - so if they explained this this was good!

# Change tech to a factor so that categories work for plotting
cap_final[,Technology:= as.factor(Technology)]
# view it
tail(cap_final)
# Set colors to use for plotting graphs
## Set high level groups
color_key = list('other' = 'coral',
                 "coal" = "chocolate4",
                 'petroleum' = "black",
                 'gas' = "gray79",
                 'combined cycle' = 'gray22',
                 'waste' = 'red4',
                 'nuclear' = 'lightgoldenrod',
                 'hydro' = 'cornflowerblue',
                 'wind' = 'aliceblue',
                 'solar' = 'gold',
                 'geothermal' = 'brown2'
                 )
## Get a list of technologies
techs <-  unique(cap_final$Technology)
## Make a hash lookup for the set of 27 techs
colors <-  rep('coral', length(techs))
colors <-  setNames(colors, techs)
## Aggregate by types
cap_final[, tech := "other"]
cap_final[, color := "coral"]
## loop through the big categories and search for them in the technologies.
for (fuel in names(color_key)){
  # Make a second key for the 27 techs
  colors[grep(fuel,techs,ignore.case = TRUE)] = color_key[fuel]
  # add a column with the 10 techs
  cap_final[grep(fuel,Technology,ignore.case = TRUE), tech:=fuel]
  # add a column for the color
  cap_final[grep(fuel,Technology,ignore.case = TRUE), color:=color_key[fuel]]
}

# change tech to a factor(categories)
cap_final[,tech:= as.factor(tech)]
# reorder in order of pollution levels
cap_final[,tech := factor(cap_final[,tech],levels = names(color_key))]
# Change tech also to a factor
cap_final[,Technology:= as.factor(Technology)]
# Make a lookup from tech group to technology
key <-  unique(cap_final[,.(tech,Technology)])
setkey(key, tech)
# Change data types to factors then reset key
cap_final[,Technology := factor(cap_final[,Technology],levels = key[,Technology])]
cap_final[,color:= as.factor(color)]
setkey(cap_final, tech, Technology, year)
# Aggregate on the Technologies to small techs
cap_agg = cap_final[,.(capacity = sum(capacity)), , .(tech,year, color)]


# Unaggregated Technology stacked area chart
ggplot(cap_final, aes(x = year, y= capacity, fill = Technology)) +
  geom_area(position = 'stack') +
  theme(legend.text = element_text(size = 8)) +
  guides(fill = guide_legend(ncol = 1)) +
  ylab("Nameplate Capacity (MW)") +
  ggtitle("3c) United States Electricity Generation Capacity by Technology and Year", subtitle = "source: EIA-860 data")

  # scale_fill_manual(values = colors)
ggsave(file.path(output_folder, 'stacked_capacity_all.pdf'), width = 16, height = 9)

# Unaggregated Technology stacked area chart as a proportion of total capacity
ggplot(cap_final, aes(x = year, y= capacity, fill = Technology)) +
  geom_area(position = 'fill') +
  theme(legend.text = element_text(size = 8)) +
  guides(fill = guide_legend(ncol = 1)) +
  ylab("Nameplate Capacity (% of total)") +
  ggtitle("United States Electricity Generation Capacity by Technology and Year", subtitle = "source: EIA-860 data")

  # scale_fill_manual(values = colors) +
ggsave(file.path(output_folder, 'proportion_capacity_all.pdf'), width = 16, height = 9)

# Aggregated Technology - color codedd
ggplot(cap_agg, aes(x = year, y= capacity, fill = tech)) +
  geom_area(position = 'stack') +
  theme(legend.text = element_text(size = 8)) +
  guides(fill = guide_legend(ncol = 1)) +
  ylab("Nameplate Capacity (MW)") +
  ggtitle("United States Electricity Generation Capacity by Technology and Year", subtitle = "source: EIA-860 data") +
  scale_fill_manual(values = color_key)

ggsave(file.path(output_folder, 'stacked_capacity_agg.pdf'), width = 16, height = 9)

# Aggregated Technology as % - color coded
ggplot(cap_agg, aes(x = year, y= capacity, fill = tech)) +
  geom_area(position = 'fill') +
  theme(legend.text = element_text(size = 8)) +
  guides(fill = guide_legend(ncol = 1)) +
  scale_fill_manual(values = color_key) +
  ylab("Nameplate Capacity (% of total)") +
  ggtitle("United States Electricity Generation Capacity by Technology and Year", subtitle = "source: EIA-860 data")

ggsave(file.path(output_folder, 'proportion_capacity_agg.pdf'), width = 16, height = 9)

Add total carbon emissions to the graph. Is there any correlation?

# Unaggregated Technology stacked area chart as a proportion of total capacity

em_total
# Play around with this scaling factor

coeff =  2600
# Plot capfinal
ggplot(cap_agg[year >= 1973]) +
  geom_area(position = 'fill',aes(year, capacity, fill= tech)) + # This is the area chart
  geom_line(data =em_total,aes(x=  year, y = `Total Emissions`/coeff), color = 'red') + # This is the line chart
  scale_fill_manual(values = color_key) +
  scale_y_continuous(name = "Capacity (% of total)", # Sets the first axis' name
                     sec.axis = sec_axis(~.*coeff/1000, name="Emissions (Billions of Metric Tons)")) + # This is the second axis.
  theme(legend.text = element_text(size = 8)) + # Make the font size smaller
  guides(fill = guide_legend(ncol = 2)) + # Make sure there's only one column in the legend
  ylab("Nameplate Capacity (% of total)") + # Set the ylabel
  ggtitle("3c) United States Electricity Generation Capacity by Technology and Year", subtitle = "source: EIA-860 data") # Set the tile and subtitle

cap_agg
# Plot capfinal
coeff = 2
# Aggregated Technology - color codedd
ggplot(cap_agg[year >= 1973], aes(x = year, y= capacity/1000, fill = tech)) +
  geom_area(position = 'stack') +
  theme(legend.text = element_text(size = 8)) +
  guides(fill = guide_legend(ncol = 1)) +
  ylab("Nameplate Capacity (GW)") +
  ggtitle("United States Electricity Generation Capacity by Technology and Year", subtitle = "source: EIA-860 data") +
  scale_fill_manual(values = color_key) +
  geom_line(data =em_total,aes(x=  year, y = `Total Emissions`/coeff, fill = NULL), color = 'red') + # This is the line chart
  scale_y_continuous(name = "Capacity (GW)", # Sets the first axis' name
                     sec.axis = sec_axis(~.*coeff/1000, name="Emissions (Billions of Metric Tons)")) + # This is the second
ggsave(file.path(output_folder, 'stacked_capacity_agg.pdf'), width = 16, height = 9)

Question 4

Suppose you are interested in the causal effect of coal capacity on carbon emissions and you want to use the dataset you just build to estimate it.

Run a regression of carbon emissions on coal capacity and present your results using the stargazer package. Use robust standard errors.

This should be straightforward - filter, merge, and then run the regression.

# First we need to join the two datasets
coal <- cap_agg[tech == "coal"]
# Setting the key means it knows which rows to merge
setkey(coal, year)
# 
coal <- merge(coal, em_total)

# Make the fit the base r way
fit_1 <- lm(`Total Emissions` ~ capacity, data = coal)
# These are the standard errors that should be reported, but see starprep for the next ways.  
lm_robust(`Total Emissions` ~ capacity, data = coal)
##                Estimate   Std. Error  t value     Pr(>|t|)     CI Lower
## (Intercept) 5.56559e+02 1.504316e+02 3.699749 5.854190e-04 2.535743e+02
## capacity    5.10658e-03 5.522671e-04 9.246576 5.703112e-12 3.994257e-03
##                 CI Upper DF
## (Intercept) 8.595437e+02 45
## capacity    6.218903e-03 45
# use starprep (from the estimatr library) to uses lm_robust defaults for standard errors (robust HC2 SEs are the default in both lm_robust). 
# See https://declaredesign.org/r/estimatr/articles/regression-tables.html#stargazer
# Then compare the two regressions.
stargazer(fit_1, se = starprep(fit_1), type = 'text')
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                          `Total Emissions`     
## -----------------------------------------------
## capacity                     0.005***          
##                               (0.001)          
##                                                
## Constant                    556.559***         
##                              (150.432)         
##                                                
## -----------------------------------------------
## Observations                    47             
## R2                             0.566           
## Adjusted R2                    0.556           
## Residual Std. Error      198.178 (df = 45)     
## F Statistic           58.666*** (df = 1; 45)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Does the previous regression capture the causal effect of coal capacity on carbon emissions? Why? If there is bias, do you expected to go in a particular direction? Explain.

We do expect to see the effect of coal capacity on emissions, because we understand a causal mechanism through which capacity affects emissions - through generation. So the regression captures some causal effect. The explanation of causality requires a mechanism, just because it’s biased does not mean it’s not causal.

On the other hand, identification of the causal effect does depend on bias. We call this ‘establishing causality’. In this case, we have a bunch of other fuels which bias the result - other fuels like natural gas also produces emissions, and these also increase total emissions. This means that the coefficient is upward biased (it is too high!).

Are there any controls you could add to correct this problem? Explain. You don’t have to run more regressions, just to discuss how you would do it assuming you had access to typically available data.

You could add a controls for all of the other fuels, fuel prices, demand etc for things that affect coal generation.